Abstract 

We study EC3, a variant of Exact Cover which is equivalent to 
Positive 1-in-3 SAT. Random instances of EC3 were recently used as 
benchmarks for simulations of an adiabatic quantum algorithm. 
Empirical results suggest that EC3 has a phase transition from 
satisfiability to unsatisfiability when the number of clauses per 
variable r exceeds some threshold r* ≈ 0.62 ± 0.01. Using the method of 
differential equations, we show that if r < 0.546 then w.h.p. a random 
instance of EC3 is satisfiable. Combined with previous results, this 
limits the location of the threshold, if it exists, to the range 
0.546 < r* < 0.644. 



1 Introduction 

Numerous constraint satisfaction problems are believed to have a "phase 
transition" in the random case when the ratio r of clauses to variables crosses 
a critical threshold r*: that is, random formulas are w.h.p. satisfiable if 
r < r*, and w.h.p. unsatisfiable if r > r*, in the limit where the number of 
variables n tends to infinity. For 3-SAT, for instance, this ratio appears to 
be roughly 4.27; see [1] for a review. 

In this paper we study a similar phase transition in a variant of Exact 
Cover known as EC3 [2, 3]. An instance of Exact Cover consists of a set 
$S = \{a_1, a_2, \ldots, a_m\}$ and a family of subsets of $S$, 
$F = \{S_1, S_2, \ldots, S_n\}$. The problem is to determine whether there 
is a subfamily $C \subseteq F$ such that each element in $S$ is contained 
in exactly one $S_i \in C$. In EC3, each $a_j \in S$ is restricted to 
appear in exactly three of the subsets $S_i \in F$. 

EC3 can be formulated as Positive 1-in-3 SAT. Here we have a set of 
boolean variables $V = \{v_1, v_2, \ldots, v_n\}$ and a set of clauses 
$C = \{c_1, c_2, \ldots, c_m\}$, where each $c_j \subseteq V$ and 
$|c_j| = 3$. Note that the variables appear as positive literals only. A 
clause is satisfied when exactly one of its variables is true. The problem 
is to determine whether any of the $2^n$ truth assignments satisfies every 
clause in $C$. An instance of EC3 can be transformed into an instance of 
Positive 1-in-3 SAT by setting $v_i \leftarrow S_i$ and 
$c_j \leftarrow \{v_i \mid a_j \in S_i\}$. In what follows we refer to 
clauses and variables rather than sets and covers. 
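
The transformation just described can be sketched in a few lines of Python. This is only an illustration: the function names and the toy instance are made up here, variables are identified with subset indices 0..n-1, and the brute-force check over all 2^n assignments is feasible only for very small n.

```python
from itertools import product

def ec3_to_clauses(subsets, elements):
    """One clause per element a_j: the indices i of the subsets S_i containing a_j."""
    clauses = []
    for a in elements:
        clause = frozenset(i for i, s in enumerate(subsets) if a in s)
        if len(clause) != 3:
            raise ValueError(f"element {a!r} lies in {len(clause)} subsets, not 3")
        clauses.append(clause)
    return clauses

def satisfies(assignment, clauses):
    """True if every clause contains exactly one variable set to True."""
    return all(sum(assignment[v] for v in c) == 1 for c in clauses)

def brute_force_satisfiable(n, clauses):
    """Try all 2^n truth assignments (illustration only)."""
    return any(satisfies(bits, clauses)
               for bits in product((False, True), repeat=n))

# Hypothetical toy instance: elements {1, 2}, each lying in exactly three subsets.
subsets = [{1, 2}, {1}, {1, 2}, {2}]
clauses = ec3_to_clauses(subsets, elements=[1, 2])
```

Here choosing only the first subset is an exact cover, which corresponds to setting the first variable true and all others false.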

We conjecture that EC3 possesses a phase transition at some density 
r*, where we construct random formulas with m = rn clauses by choosing 
uniformly from among the $\binom{n}{3}$ possible clauses with replacement. 
We note that techniques of Friedgut [15] can be used to show that a 
non-uniform threshold exists, i.e., a function r*(n) exists such that, for 
any ε > 0, random formulas are w.h.p. satisfiable if r < (1 − ε) r*(n) and 
w.h.p. unsatisfiable if r > (1 + ε) r*(n). Interestingly, for 1-in-k SAT 
where variables can be negated, Achlioptas, Chtcherba, Istrate and Moore [3] 
established rigorously that a threshold exists, at $r^* = 1/\binom{k}{2}$. 
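
As a minimal sketch of this random model (the function name and the rounding of rn to an integer are assumptions made here, not part of the paper), each clause can be drawn independently as a uniformly random 3-subset of the variables, which is the same as sampling with replacement from the possible clauses:

```python
import random

def random_ec3_formula(n, r, rng=None):
    """Draw m = round(r * n) clauses, each a uniform random 3-subset of
    {0, ..., n-1}; independent draws mean the same clause may repeat."""
    rng = rng or random.Random()
    m = int(round(r * n))
    return [tuple(sorted(rng.sample(range(n), 3))) for _ in range(m)]

formula = random_ec3_formula(n=20, r=0.5, rng=random.Random(0))
```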

Knysh, Smelyanskiy and Morris [5] showed that random EC3 formulas 
are w.h.p. unsatisfiable if r > 0.644, establishing an upper bound r* < 0.644 
if the transition exists. Our main result establishes the lower bound r* > 
0.546. Formally: 

Theorem 1 Let $\phi$ be an EC3 formula consisting of m = rn clauses chosen 
uniformly with replacement from the $\binom{n}{3}$ possible clauses. If 
r < 0.546, then $\lim_{n \to \infty} \Pr[\phi \text{ is satisfiable}] = 1$. 

Our proof uses the method of differential equations [6] to show satisfiability 
with positive probability. Satisfiability w.h.p. then follows from the non- 
uniform threshold referred to above. 

In addition to the transition phenomenon, our motivation is partly that 
Farhi et al. recently simulated a quantum adiabatic algorithm [7] on random 
instances of EC3. They were only able to simulate this algorithm on small 
numbers of variables (up to 17), but, in this range, the algorithm appeared 
to work in polynomial time on formulas with a variety of values of r. This 
is exciting given that EC3 is NP-complete. On the other hand, van Dam 
and Vazirani [8] showed that such algorithms cannot succeed in polynomial 
time in the worst case, suggesting that either the experiments in [9] do not 
capture the asymptotic behavior of the algorithm, or that random formulas 
are considerably easier than worst-case ones. 

2 The lower bound 



In this section we prove Theorem 1 using the technique of differential 
equations. Before delving into the proof, we first describe the mechanics 
of setting variables in an EC3 formula. We call clauses of length i in the 
formula "i-clauses". 1-clauses are also called unit clauses. Setting a 
variable $v$ false replaces each 3-clause $t_i = \{v, x_i, y_i\}$ it 
appears in with a 2-clause $x_i \oplus y_i$, and replaces each 2-clause 
$b_j = v \oplus z_j$ it appears in with a positive unit clause $z_j$. 
Similarly, setting $v$ true replaces each 3-clause it appears in with two 
negative unit clauses $\bar{x}_i, \bar{y}_i$ and replaces each 2-clause it 
appears in with a negative unit clause $\bar{z}_j$. 

We analyze a simple greedy algorithm which is a variant of Unit Clause 
resolution or UC for short [12]. Algorithms based on UC have so-called 
"free" and "forced" steps. A free step is one in which the algorithm decides 
on a variable and the value to which that variable is set. Forced steps result 
from unit propagations, i.e., repeatedly satisfying all unit clauses until none 
are left. Two of the common ways to choose the variable on the free step 
are 

1. choose a variable at random, 

2. for a fixed i, choose an i-clause at random, then choose a variable at 
random from one of the i variables in the clause. 

We obtained the best lower bounds by using method 2, known as Short 
Clause or SC, and always setting the chosen variable to true. Our algorithm 
is shown in Table 1. 

We call each iteration of the outer while loop, i.e., a free step followed 
by a series of forced steps, a round. Since resolving a unit clause creates 
more unit clauses, the forced steps are described by a branching process. 
Our main goal will be to show that this branching process w.h.p. remains 
subcritical throughout the algorithm for sufficiently small r, so that the 
number of variables set in any round will be O(1) w.h.p. 

To analyze our algorithm we need to track the change in the number of 
2-clauses and 3-clauses in each round. At the start of the algorithm we 
have no 2-clauses, but it can be shown that after o(n) free steps the 
number of 2-clauses is w.h.p. positive, and returns to zero only after 
Θ(n) free steps. This can be proved similarly to Lemma 3 in [13], by 
showing that the expected number of 2-clauses is positive after o(n) 
steps. As will be shown later, once the 2-clauses are exhausted the 
remaining 3-clauses form a very sparse formula, which can easily be 
satisfied. Therefore, we focus on the phase of 

while there are any unset variables, do { 
    // Free step. 
    if there are any 2-clauses 
        choose a clause c at random from the 2-clauses 
    else 
        choose a clause c at random from the 3-clauses 
    choose a variable x ∈ c at random 
    set x = TRUE 
    // Forced steps. 
    while there are unit clauses, satisfy them; 


the algorithm when w.h.p. 2-clauses exist, in which case the free step always 
sets a variable in a 2-clause. 

In what follows we describe the branching process corresponding to the 
forced steps. We then analyze the expected effect of each round, and give a 
set of differential equations that describe the "trajectory" of the algorithm. 
Finally, we solve these differential equations and show that for r < 0.546 
the branching process remains subcritical. 

Let n be the number of variables in the formula. Let m = rn be the 
number of clauses. Let $T = t \cdot n$ be the number of rounds completed so 
far. For i = 2, 3 let $S_i(T) = s_i(t) \cdot n$ be the number of clauses of 
length i. Let $X(T) = x(t) \cdot n$ be the number of variables set so far. 
Let $m_T$, $m_F$ be the expected number of variables set to true, false 
respectively in each round (inclusive of the variable set in the free 
step). 

We compute $m_T$, $m_F$ according to a two-type branching process as in [16]. 

The two types here are positive and negative unit clauses. In the free step 
we set a variable in a 2-clause to true and this forces us to set the other 
variable in the 2-clause to false. Thus the initial expected population of 
unit clauses can be represented by a vector 

$$ p_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad (1) $$

where the first and second components count the positive and negative unit 
clauses respectively. 



} 



Table 1: Our Algorithm - SC. 
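
A minimal Python sketch of the algorithm in Table 1 follows. It is an illustration rather than the authors' implementation: the function name, the data structures, the choice to return None on a contradiction, and the choice to set leftover unconstrained variables to false once no clauses remain are all assumptions made here.

```python
import random

def short_clause(n, clauses, rng=None):
    """Run SC once on a formula over variables 0..n-1. Returns a list of
    bools, or None if unit propagation hits a contradiction."""
    rng = rng or random.Random()
    value = [None] * n
    open_clauses = [set(c) for c in clauses]  # exactly-one constraints on unset variables

    def assign(v, val, forced):
        """Set v := val and queue the unit clauses this creates.
        Returns False if v was already set to the opposite value."""
        nonlocal open_clauses
        if value[v] is not None:
            return value[v] == val
        value[v] = val
        remaining = []
        for c in open_clauses:
            if v not in c:
                remaining.append(c)
                continue
            rest = c - {v}
            if val:                     # v true: every other variable must be false
                forced.extend((u, False) for u in rest)
            elif len(rest) == 1:        # 2-clause lost a false variable: positive unit
                forced.append((next(iter(rest)), True))
            else:                       # 3-clause shrinks to a 2-clause
                remaining.append(rest)
        open_clauses = remaining
        return True

    while any(u is None for u in value):
        # Free step: prefer a random 2-clause, otherwise a random 3-clause.
        twos = [c for c in open_clauses if len(c) == 2]
        pool = twos or open_clauses
        if not pool:
            break                       # assumption: leftover variables are unconstrained
        x = rng.choice(sorted(rng.choice(pool)))
        forced = [(x, True)]
        # Forced steps: satisfy unit clauses until none are left.
        while forced:
            v, val = forced.pop()
            if not assign(v, val, forced):
                return None
    return [bool(u) for u in value]
```

A single run either returns an assignment in which every clause has exactly one true variable or gives up; it does not backtrack, which matches the "succeeds with positive probability" style of analysis used in this section.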









We wish to determine the transition matrix of the branching process. 
If X variables have been set so far, the probability of a variable appearing 
in a given i-clause is $i/(n - X)$. So, setting a variable to true, i.e., 
satisfying a positive unit clause, creates, in expectation, 
$(6S_3 + 2S_2)/(n - X)$ negative unit clauses. Similarly, satisfying a 
negative unit clause creates, in expectation, $2S_2/(n - X)$ positive unit 
clauses. Thus, we have the following transition matrix M for the branching 
process: 

$$ M = \frac{1}{n - X} \begin{pmatrix} 0 & 6S_3 + 2S_2 \\ 2S_2 & 0 \end{pmatrix} = \frac{1}{1 - x} \begin{pmatrix} 0 & 6s_3 + 2s_2 \\ 2s_2 & 0 \end{pmatrix} \qquad (2) $$

As long as the largest eigenvalue $\lambda_1$ of M is less than 1, the 
expected number of variables set to true or false in each round is given by 
the geometric series 

$$ \begin{pmatrix} m_T \\ m_F \end{pmatrix} = (I + M + M^2 + \cdots) \cdot p_0 = (I - M)^{-1} \cdot p_0 \qquad (3) $$

where I is the identity matrix. Moreover, as long as $\lambda_1 < 1$ 
throughout the algorithm, i.e., as long as the branching process is 
subcritical for all x, $m_T$ and $m_F$ remain O(1) and, as in [11, 16], our 
algorithm succeeds with positive probability. On the other hand, if 
$\lambda_1$ ever exceeds 1, then the branching process becomes 
supercritical, the unit clauses proliferate with high probability and the 
algorithm fails. Note that 



$$ \lambda_1 = \frac{2\sqrt{s_2 (s_2 + 3 s_3)}}{1 - x} \qquad (4) $$
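
The quantities of the branching process can be evaluated with a short sketch like the one below. It works directly from the offspring counts stated in the text (each variable set true creates a negative unit clauses, each variable set false creates b positive ones), so that m_T = 1 + b m_F and m_F = 1 + a m_T, rather than from a particular matrix layout. The function name and the symbols a and b are introduced here for illustration only.

```python
import math

def branching_stats(s2, s3, x):
    """Return (lambda_1, m_T, m_F) for clause densities s2, s3 and a
    fraction x of variables already set; valid while subcritical."""
    a = (6.0 * s3 + 2.0 * s2) / (1.0 - x)  # negative unit clauses per variable set true
    b = 2.0 * s2 / (1.0 - x)               # positive unit clauses per variable set false
    if a * b >= 1.0:
        raise ValueError("branching process is not subcritical")
    lam = math.sqrt(a * b)                 # largest eigenvalue of the transition matrix
    m_t = (1.0 + b) / (1.0 - a * b)        # solves m_T = 1 + b * m_F
    m_f = (1.0 + a) / (1.0 - a * b)        # solves m_F = 1 + a * m_T
    return lam, m_t, m_f
```

The product ab equals 4 s_2 (s_2 + 3 s_3)/(1 - x)^2, so the square root of ab is the eigenvalue of the transition matrix written in terms of s_2, s_3 and x.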

Our next step is to write down the expected change in $S_2$, $S_3$ and $X$ 
in a given round as a function of their values at the beginning of the 
round. We define $\Delta f(T) = f(T+1) - f(T)$. Then: 

$$ E[\Delta X(T)] = m_T + m_F \qquad (5) $$

$$ E[\Delta S_3(T)] = -(m_T + m_F) \frac{3 S_3}{n - X} + o(1) \qquad (6) $$

$$ E[\Delta S_2(T)] = m_F \frac{3 S_3}{n - X} - (m_T + m_F) \frac{2 S_2}{n - X} - 1 + o(1) \qquad (7) $$

To see this, recall that in expectation we set $m_T + m_F$ variables, 
inclusive of the variable chosen on the free step, giving (5). Any variable 
set during a round appears in $3S_3/(n - X)$ 3-clauses and $2S_2/(n - X)$ 
2-clauses, in expectation; these clauses are removed, giving (6) and the 
first negative term in (7). The o(1) terms absorb the probability that a 
given clause is "hit" twice during a round. Among the 3-clauses, those that 
had a variable set to false become 2-clauses, giving the positive term in 
(7). Finally, the −1 in (7) comes from the fact that SC chooses a random 
2-clause and removes it on the free step. 

Wormald's Theorem [6] allows us to rescale (5), (6), and (7) to form a 
system of differential equations for $s_i(x)$. The random variables 
$S_i(xn)$ will then be w.h.p. within o(n) of $s_i(x) \cdot n$ for all x, 
where $s_i(x)$ are the solutions to these equations. By changing the 
variable of integration to x and ignoring o(1) terms, we transform these 
equations to the following simpler form: 

$$ \frac{ds_3}{dx} = -\frac{3 s_3}{1 - x} \qquad (8) $$

$$ \frac{ds_2}{dx} = \frac{m_F}{m_T + m_F} \cdot \frac{3 s_3}{1 - x} - \frac{2 s_2}{1 - x} - \frac{1}{m_T + m_F} \qquad (9) $$

The initial conditions are $s_3(0) = r$, $s_2(0) = 0$, even though as in [11] 
the differential equations trace the evolution of $s_2$ and $s_3$ after an 
o(1) fraction of the variables have been set. 

When r = 0.546, numerically solving the differential equations gives us, 
at x ≈ 0.29, $\max_x \lambda_1 \approx 0.996 < 1$, so the branching process 
remains subcritical. At x ≈ 0.79, the density of the 2-clauses becomes 
$s_2(x) = 0$. This means the algorithm succeeds with positive probability in 
exhausting all the 2-clauses. The density of the remaining 3-clauses is 
$s_3(x)/(1 - x) \approx 0.02$. For EC3 formulas with such low densities, the 
graph of clause to variable connectivity (i.e., the graph in which clauses 
are nodes and clauses that have a variable in common have an edge between 
them) with positive probability consists of trees only (and in the 
terminology of [5] the formula has no "core"). The formula can then be 
satisfied by repeatedly satisfying variables on the leaves of these trees. 
As a result, the algorithm succeeds with positive probability whenever 
r < 0.546, completing the proof of Theorem 1. 
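
The numerical solution described above can be approximated with a fixed-step fourth-order Runge-Kutta integrator, sketched below. The sketch assumes m_T = (1 + b)/(1 - ab) and m_F = (1 + a)/(1 - ab), with a and b the expected numbers of unit clauses created by setting a variable true or false; the function names, the step size and the returned dictionary are choices made here, not taken from the paper.

```python
import math

def derivatives(x, s2, s3):
    """Right-hand sides (ds2/dx, ds3/dx) of the rescaled system."""
    u = 1.0 - x
    a = (6.0 * s3 + 2.0 * s2) / u            # negatives created per variable set true
    b = 2.0 * s2 / u                         # positives created per variable set false
    frac_false = (1.0 + a) / (2.0 + a + b)   # m_F / (m_T + m_F)
    inv_total = (1.0 - a * b) / (2.0 + a + b)  # 1 / (m_T + m_F)
    ds3 = -3.0 * s3 / u
    ds2 = frac_false * 3.0 * s3 / u - 2.0 * s2 / u - inv_total
    return ds2, ds3

def trace(r, h=1e-3, x_max=0.99):
    """Integrate from s3(0) = r, s2(0) = 0 until s2 returns to zero, the
    process stops being subcritical, or x reaches x_max."""
    x, s2, s3 = 0.0, 0.0, r
    lam_max, x_at_max, subcritical = 0.0, 0.0, True
    while x < x_max:
        k1 = derivatives(x, s2, s3)
        k2 = derivatives(x + h / 2, s2 + h / 2 * k1[0], s3 + h / 2 * k1[1])
        k3 = derivatives(x + h / 2, s2 + h / 2 * k2[0], s3 + h / 2 * k2[1])
        k4 = derivatives(x + h, s2 + h * k3[0], s3 + h * k3[1])
        s2 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s3 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
        if s2 <= 0.0:                        # 2-clauses exhausted
            break
        lam = 2.0 * math.sqrt(s2 * (s2 + 3.0 * s3)) / (1.0 - x)
        if lam > lam_max:
            lam_max, x_at_max = lam, x
        if lam >= 1.0:                       # analysis no longer applies
            subcritical = False
            break
    return {"subcritical": subcritical, "lam_max": lam_max,
            "x_at_max": x_at_max, "x_end": x, "s3_end": s3}

result = trace(0.3)
```

Calling `trace(0.546)` gives values that can be compared with the figures quoted in the text. The equation for s_3 decouples and has the closed form s_3(x) = r (1 - x)^3, which provides a simple check on the integrator.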

We analyzed two other kinds of free steps, but they gave weaker bounds. 
Setting a random variable true gives r* > 0.5097, and choosing a random 3- 
clause and setting one of its variables true gives r* > 0.5386. Probabilistic 
mixes of these steps with SC also appear to give weaker bounds. 

3 Numerical experiments 

We conclude with our own numerical experiments. For each value of r 
and n we performed 10^ trials, each of which consisted of creating a random 
EC3 formula and checking whether it is satisfiable or not using the 3-SAT 
solver Satz [17]. The fraction of these which are satisfiable, as a function 
of r for various values of n, is shown in Figure 1. Using the place where 
these curves cross as our estimate of the threshold (a common technique in 
finite-size scaling) suggests that r* ≈ 0.62. 




Figure 1: The probability of satisfiability as a function of r for n = 
300, 400, 500 and 600. 



4 Conclusion 

We have placed a lower bound of r* > 0.546 on the threshold of the phase 
transition in EC3. Combined with the upper bound of r* < 0.644 [5], a 
fairly small gap of 0.098 remains. It might be possible to improve our lower 
bound using algorithms that choose a variable based on the number of its 
occurrences in the remaining formula, as in [16, 18]. 